Soft Clustering Criterion Functions for Partitional Document Clustering
Abstract
Recently published studies have shown that partitional clustering algorithms that optimize certain criterion functions, which measure key aspects of inter- and intra-cluster similarity, are very effective in producing hard clustering solutions for document datasets and outperform traditional partitional and agglomerative algorithms. In this paper we study the extent to which these criterion functions can be modified to include soft membership functions, and whether the resulting soft clustering algorithms can further improve the clustering solutions. Specifically, we focus on four of these hard criterion functions, derive their soft-clustering extensions, present a comprehensive experimental evaluation involving twelve different datasets, and analyze their overall characteristics. Our results show that introducing softness into the criterion functions tends to lead to better clustering results for most datasets and consistently improves the separation between the clusters.
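The abstract does not spell out the four criterion functions or their soft forms. As a hedged illustration only, assume one of them is an I2-style criterion of the kind used in CLUTO-related work: with unit-length document vectors, it sums, over clusters, the length of each cluster's composite (sum) vector. The sketch below replaces the hard 0/1 cluster indicator with a membership weight `u[i][r]` (each row summing to 1). This weighting scheme and the helper names `normalize` and `soft_i2` are hypothetical and are not the paper's derivation.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (documents are compared by cosine)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def soft_i2(docs, memberships):
    """Hypothetical soft variant of an I2-style criterion.

    docs: list of unit-length document vectors (lists of floats).
    memberships: memberships[i][r] is the weight of document i in cluster r;
                 each row is assumed to be non-negative and to sum to 1.
    Returns the sum over clusters r of ||D_r||, where D_r = sum_i u_ir * d_i.
    With 0/1 memberships this is the hard criterion sum_r ||D_r||.
    """
    k = len(memberships[0])
    dim = len(docs[0])
    total = 0.0
    for r in range(k):
        # Membership-weighted composite vector of cluster r.
        composite = [0.0] * dim
        for d, u in zip(docs, memberships):
            for j in range(dim):
                composite[j] += u[r] * d[j]
        total += math.sqrt(sum(x * x for x in composite))
    return total


docs = [normalize([1.0, 0.0]), normalize([1.0, 0.0]), normalize([0.0, 1.0])]
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
soft = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]
print(soft_i2(docs, hard), soft_i2(docs, soft))
```

On this toy input the hard assignment scores 3.0 and the soft one scores sqrt(3.28) + sqrt(0.68), which is lower. This is a general property of the naive form: the criterion is a sum of norms of linear functions of the memberships, so it is convex in them, and its maximum always lies at a hard assignment. A useful soft extension, such as the ones the paper derives, therefore needs additional structure beyond simply substituting weights for indicators.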
Similar Resources
Comparison of Agglomerative and Partitional Document Clustering Algorithms
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retrieval performance either via cluster-driven dimensionality reduction, term-weighting, or query expansion. This ever-increasing importance of do...
gCLUTO – An Interactive Clustering, Visualization, and Analysis System (Technical Report TR 04-021, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN)
Partitional Clustering Experiments on Document Datasets
The purpose of this study is to evaluate and compare several criterion functions used for document clustering. Each function is evaluated using different clustering methods and different datasets. Detailed experiments show that some clustering criterion functions perform better than the rest. The results are also consistent with previous work that compared the same criterion functions.
Hierarchical Clustering in Medical Document Collections: the BIC-Means Method
Hierarchical clustering of text collections is a key problem in document management and retrieval. In partitional hierarchical clustering, which is more efficient than its agglomerative counterpart, the entire collection is split into clusters and the individual clusters are further split until a heuristically-motivated termination criterion is met. In this paper, we define the BIC-means algori...
A Comparison of Two Document Clustering Approaches for Clustering Medical Documents
form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into a small number of meaningful clusters may facilitate discovering patterns by allowing us to extract a number of relevant features from ...